Lab2: Regularization
Ridge, LASSO, and Elastic-net Regression
Prostate Cancer Data
Variables (97 observations, 9 variables):

- lpsa: log(prostate-specific antigen)
- lcavol: log(cancer volume)
- lweight: log(prostate weight)
- age: age
- lbph: log(amount of benign prostatic hyperplasia)
- svi: seminal vesicle invasion
- lcp: log(capsular penetration)
- gleason: Gleason score
- pgg45: percentage Gleason scores 4 or 5
lcavol lweight age lbph svi lcp gleason pgg45 lpsa
1 -0.57982 2.7695 50 -1.3863 0 -1.3863 6 0 -0.43078
2 -0.99425 3.3196 58 -1.3863 0 -1.3863 6 0 -0.16252
3 -0.51083 2.6912 74 -1.3863 0 -1.3863 7 20 -0.16252
4 -1.20397 3.2828 58 -1.3863 0 -1.3863 6 0 -0.16252
5 0.75142 3.4324 62 -1.3863 0 -1.3863 6 0 0.37156
6 -1.04982 3.2288 50 -1.3863 0 -1.3863 6 0 0.76547
'data.frame': 97 obs. of 9 variables:
$ lcavol : num -0.58 -0.994 -0.511 -1.204 0.751 ...
$ lweight: num 2.77 3.32 2.69 3.28 3.43 ...
$ age : int 50 58 74 58 62 50 64 58 47 63 ...
$ lbph : num -1.39 -1.39 -1.39 -1.39 -1.39 ...
$ svi : int 0 0 0 0 0 0 0 0 0 0 ...
$ lcp : num -1.39 -1.39 -1.39 -1.39 -1.39 ...
$ gleason: int 6 6 7 6 6 6 6 6 6 6 ...
$ pgg45 : int 0 0 20 0 0 0 0 0 0 0 ...
$ lpsa : num -0.431 -0.163 -0.163 -0.163 0.372 ...
| Formula | Model |
|---|---|
| `Y ~ 1` | \(Y = \beta_0 + \epsilon\) |
| `Y ~ X1 + X2` | \(Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon\) |
| `Y ~ X1 + X2 - 1` | \(Y = \beta_1X_1 + \beta_2X_2 + \epsilon\) |
| `Y ~ .` | \(Y = \beta_0 + \beta_1X_1 + \ldots + \beta_pX_p + \epsilon\) |
| `Y ~ . - Xp` | \(Y = \beta_0 + \beta_1X_1 + \ldots + \beta_{p-1}X_{p-1} + \epsilon\) |
| `Y ~ X1:X2` | \(Y = \beta_0 + \beta_{12}X_1X_2 + \epsilon\) |
| `Y ~ X1 * X2` | \(Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_{12}X_1X_2 + \epsilon\) |
| `Y ~ (X1 + X2 + X3)^2` | \(\begin{aligned} Y = &\beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \\ &\beta_{12}X_1X_2 + \beta_{13}X_1X_3 + \beta_{23}X_2X_3 + \epsilon \\ &(\color{red}{\beta_{123}X_1X_2X_3} \text{ is NOT included}) \end{aligned}\) |
| `Y ~ X1 + I(X1^2)` or `Y ~ poly(X1, 2, raw = TRUE)` | \(Y = \beta_0 + \beta_1X_1 + \beta_2X_1^2 + \epsilon\) |
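The formula expansions above can be checked directly with `model.matrix()`. A minimal sketch on toy data (the data frame `df` below is made up for illustration):

```r
# Toy data to illustrate how R expands model formulas into design matrices
set.seed(1)
df <- data.frame(X1 = rnorm(5), X2 = rnorm(5), X3 = rnorm(5))

colnames(model.matrix(~ X1 + X2, df))      # "(Intercept)" "X1" "X2"
colnames(model.matrix(~ X1 + X2 - 1, df))  # "X1" "X2" (no intercept)
colnames(model.matrix(~ X1 * X2, df))      # adds the "X1:X2" interaction
colnames(model.matrix(~ (X1 + X2 + X3)^2, df))  # all pairwise interactions, no X1:X2:X3
colnames(model.matrix(~ X1 + I(X1^2), df))      # "(Intercept)" "X1" "I(X1^2)"
```

Note that `^2` expands to main effects plus all two-way interactions only, matching the table.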
stats::step()

step() is a slightly simplified version of stepAIC() in package MASS; stepAIC() works for a wider range of object classes.

lm.for <- stepAIC(update(lm.full, . ~ 1),
                  scope = list(upper = lm.full),
                  direction = "forward",
                  trace = 0)
lm.for$anova

Stepwise Model Path
Analysis of Deviance Table
Initial Model:
lpsa ~ 1
Final Model:
lpsa ~ lcavol + lweight + svi
Step Df Deviance Resid. Df Resid. Dev AIC
1 76 100.104 22.205
2 + lcavol 1 61.5899 75 38.514 -49.344
3 + lweight 1 4.5414 74 33.973 -57.005
4 + svi 1 2.7310 73 31.242 -61.458
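The same forward-selection pattern can be reproduced with base `stats::step()` on a built-in data set (`mtcars` is used here as a stand-in for the prostate data, so no extra packages are needed):

```r
# Forward selection with stats::step(): start from the intercept-only model
# and allow terms up to the full model, exactly as in the stepAIC call above.
full <- lm(mpg ~ ., data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

fwd <- step(null,
            scope = list(upper = formula(full)),
            direction = "forward",
            trace = 0)

fwd$anova          # the step-by-step path, analogous to lm.for$anova
AIC(fwd) < AIC(null)  # forward selection never worsens the AIC of the start model
```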
lm.back <- stepAIC(lm.full,
scope = list(lower = ~ 1),
direction = "backward",
trace = 0)
lm.back$anova

Stepwise Model Path
Analysis of Deviance Table
Initial Model:
lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason +
pgg45
Final Model:
lpsa ~ lcavol + lweight + svi
Step Df Deviance Resid. Df Resid. Dev AIC
1 68 29.719 -55.305
2 - pgg45 1 0.02323 69 29.742 -57.245
3 - lbph 1 0.21344 70 29.956 -58.694
4 - lcp 1 0.22115 71 30.177 -60.128
5 - gleason 1 0.50342 72 30.680 -60.854
6 - age 1 0.56146 73 31.242 -61.458
lm.step <- stepAIC(lm.full,
                   scope = list(lower = ~ 1),
                   direction = "both",
                   trace = 0)
lm.step$anova

Stepwise Model Path
Analysis of Deviance Table
Initial Model:
lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason +
pgg45
Final Model:
lpsa ~ lcavol + lweight + svi
Step Df Deviance Resid. Df Resid. Dev AIC
1 68 29.719 -55.305
2 - pgg45 1 0.02323 69 29.742 -57.245
3 - lbph 1 0.21344 70 29.956 -58.694
4 - lcp 1 0.22115 71 30.177 -60.128
5 - gleason 1 0.50342 72 30.680 -60.854
6 - age 1 0.56146 73 31.242 -61.458
"lpsa ~ .": Create a design matrix for all variables except for lpsa."[, -1]": Exclude the column of intercepts. (The intercept term is fitted by default in glmnet)Arguments
family:
"gaussian"(default), "binomial", "poisson", "multinomial", "cox", "mgaussian".glm() family object. (See ?family)9 x 5 sparse Matrix of class "dgCMatrix"
s1 s2 s3 s4 s5
(Intercept) -0.56814 -0.0446 1.3459 2.24135 2.4e+00
lcavol 0.48574 0.2744 0.0835 0.01455 2.1e-03
lweight 0.61679 0.4190 0.1317 0.02252 3.2e-03
age -0.01271 -0.0023 0.0015 0.00041 6.2e-05
lbph 0.03367 0.0235 0.0102 0.00192 2.8e-04
svi 0.54370 0.3916 0.1570 0.02952 4.2e-03
lcp 0.00237 0.0796 0.0450 0.00894 1.3e-03
gleason 0.12456 0.0987 0.0549 0.01151 1.7e-03
pgg45 0.00063 0.0018 0.0015 0.00032 4.7e-05
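For intuition about the ridge coefficient path above: on centered data, ridge regression has a closed-form solution, \(\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y\). A minimal base-R sketch on made-up toy data (glmnet additionally standardizes and scales the penalty, so numbers will not match glmnet exactly):

```r
# Closed-form ridge regression on centered toy data:
# beta_ridge = (X'X + lambda * I)^(-1) X'y  (intercept handled by centering y)
set.seed(42)
n <- 50; p <- 3
X <- scale(matrix(rnorm(n * p), n, p))
y <- X %*% c(2, -1, 0.5) + rnorm(n)
yc <- y - mean(y)

ridge_beta <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

ridge_beta(X, yc, lambda = 0)    # equals ordinary least squares
ridge_beta(X, yc, lambda = 100)  # the coefficient norm shrinks toward zero
```

As lambda grows, coefficients shrink but (unlike the lasso) never become exactly zero, matching the dense ridge path shown above.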
9 x 5 sparse Matrix of class "dgCMatrix"
s1 s2 s3 s4 s5
(Intercept) -0.62039 -0.495 -0.3921 0.42 1.90
lcavol 0.56882 0.553 0.5485 0.53 0.43
lweight 0.64248 0.615 0.5340 0.36 .
age -0.01557 -0.011 -0.0024 . .
lbph 0.03471 0.021 . . .
svi 0.57973 0.499 0.4468 0.30 .
lcp -0.04098 . . . .
gleason 0.12856 0.092 0.0403 . .
pgg45 0.00013 . . . .
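The exact zeros in the lasso path above come from soft-thresholding. A minimal cyclic coordinate-descent sketch on made-up toy data (an illustration of the mechanism, not glmnet's actual implementation, which adds warm starts and other refinements):

```r
# Minimal lasso via cyclic coordinate descent with soft-thresholding,
# minimizing (1/2n)||y - X b||^2 + lambda * ||b||_1 on centered data.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

lasso_cd <- function(X, y, lambda, iters = 200) {
  p <- ncol(X); n <- nrow(X)
  beta <- rep(0, p)
  for (it in seq_len(iters)) {
    for (j in seq_len(p)) {
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]  # partial residual
      beta[j] <- soft(crossprod(X[, j], r) / n, lambda) / (crossprod(X[, j]) / n)
    }
  }
  beta
}

set.seed(1)
Xl <- scale(matrix(rnorm(200), 50, 4))
yl <- drop(Xl %*% c(2, -1, 0, 0)) + rnorm(50, sd = 0.5)
lasso_cd(Xl, yl - mean(yl), lambda = 0.5)  # the two noise features are driven exactly to 0
```

Whenever the correlation of a feature with the partial residual falls below lambda, its coefficient is set exactly to zero; this is why the lasso (unlike ridge) performs variable selection.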
Call: cv.glmnet(x = x, y = y, type.measure = "mse", nfolds = 10, alpha = 0)
Measure: Mean-Squared Error
Lambda Index Measure SE Nonzero
min 0.089 100 0.512 0.0677 8
1se 0.760 77 0.573 0.0992 8
9 x 1 sparse Matrix of class "dgCMatrix"
s1
(Intercept) -0.58941632
lcavol 0.51296312
lweight 0.63098503
age -0.01431908
lbph 0.03608876
svi 0.56626022
lcp -0.01742483
gleason 0.12840575
pgg45 0.00064349
Call: cv.glmnet(x = x, y = y, type.measure = "mse", nfolds = 10, alpha = 1)
Measure: Mean-Squared Error
Lambda Index Measure SE Nonzero
min 0.0959 25 0.471 0.0409 3
1se 0.1839 18 0.511 0.0594 3
9 x 1 sparse Matrix of class "dgCMatrix"
s1
(Intercept) 0.12655
lcavol 0.54269
lweight 0.42974
age .
lbph .
svi 0.37431
lcp .
gleason .
pgg45 .
Arguments

- type.measure: loss to use for cross-validation.
  - "default": MSE for gaussian models, deviance for logistic and Poisson regressions, and partial likelihood for the Cox model.
  - "mse" / "mae": mean squared / mean absolute error for all models except the Cox.
  - "deviance": deviance for logistic and Poisson regressions.
  - "class": misclassification error for binomial and multinomial logistic regressions.
  - "auc": area under the ROC curve for two-class logistic regression.
  - "C": Harrell's concordance measure for Cox models.
- s: value(s) of the penalty parameter \(\lambda\).
  - "lambda.1se" (default): the largest value of \(\lambda\) such that the error is within 1 standard error of the minimum.
  - "lambda.min": the value of \(\lambda\) that gives the minimum mean cross-validated error.

Arguments

- foldid: an optional vector of values between 1 and nfolds identifying the fold each observation is in. If supplied, nfolds can be missing.

Users can explicitly control the fold each observation is assigned to via the foldid argument. This is useful when using cross-validation to select a value for \(\alpha\).
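A sketch of building one shared foldid vector so that every alpha value is evaluated on identical folds (the fold construction is base R; the commented cv.glmnet loop shows how it would be reused):

```r
# Build one foldid vector and reuse it across cv.glmnet() calls so that
# different alpha values are compared on identical fold assignments.
set.seed(2024)
n <- 97          # number of observations in the prostate data
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)    # fold sizes differ by at most 1

# Then, for each candidate alpha:
# fits <- lapply(seq(0, 1, by = 0.1),
#                function(a) cv.glmnet(x, y, foldid = foldid, alpha = a))
```

Without a fixed foldid, each cv.glmnet() call would draw its own random folds, and differences in cvm across alpha values would partly reflect fold noise rather than the penalty mixture.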
alpha lambda cvm
343 0.9 0.028968 0.46205
9 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept) -0.4937283
lcavol 0.5496797
lweight 0.5977336
age -0.0089177
lbph 0.0136072
svi 0.4857698
lcp .
gleason 0.0803757
pgg45 .
lcavol lweight age lbph svi lcp gleason pgg45
1 1 2 2 3 3 4 4
grplasso <- sparsegl(x = x, y = y, group = grp, asparse = 0)
grplasso.cv <- cv.sparsegl(x = x, y = y, group = grp, asparse = 0, pred.loss = "mse")
grplasso.cv
Call: cv.sparsegl(x = x, y = y, group = grp, pred.loss = "mse", asparse = 0)
Error measure: Mean squared error
lambda index cvm cvsd nnzero active_grps
Max. 7.27e-02 1 1.343 0.2219 0 0
lambda.1se 1.49e-02 18 0.575 0.0950 4 2
lambda.min 1.76e-03 41 0.500 0.0803 6 3
Min. 7.27e-06 100 0.526 0.0934 8 4
Arguments

- group: a vector of consecutive integers describing the grouping of the coefficients.
- asparse: the relative weight to put on the \(\ell_1\)-norm in sparse group lasso. asparse = 0 gives the group lasso fit; asparse = 1 gives the lasso fit.

9 x 1 sparse Matrix of class "dgCMatrix"
s1
(Intercept) 1.757514
lcavol 0.403319
lweight 0.039859
age .
lbph .
svi 0.291706
lcp 0.079903
gleason .
pgg45 .
# Test-set predictions from the three stepwise models
pred.for <- predict(lm.for, testing)
pred.back <- predict(lm.back, testing)
pred.step <- predict(lm.step, testing)
# Design matrix for the test set (drop the intercept column, as for training)
z <- model.matrix(lpsa ~ ., data = testing)[, -1]
pred.ridge <- predict(ridge.cv, newx = z, s = "lambda.min")
pred.lasso <- predict(lasso.cv, newx = z, s = "lambda.min")
pred.elnet <- predict(elnet, newx = z)
# Collect every object whose name starts with "pred" and compute its test MSE
mse <- sapply(mget(ls(pattern = "^pred")), \(x) mean((x - testing$lpsa)^2))
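The `mget(ls(pattern = "^pred"))` idiom gathers every object whose name starts with "pred" into a named list, so new models join the comparison automatically. A self-contained toy illustration (the vectors below are made up):

```r
# Collect objects by name pattern and summarize them in one sapply() call
truth <- c(1, 2, 3)
pred.a <- c(1.1, 2.0, 2.8)
pred.b <- c(0.5, 2.5, 3.5)

mse <- sapply(mget(ls(pattern = "^pred")), function(x) mean((x - truth)^2))
round(mse, 3)
# pred.a pred.b
#  0.017  0.250
```

One caveat: any other object in the workspace whose name happens to start with "pred" would be swept into the comparison too, so the pattern should be kept specific.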